Hive User Defined Functions
This article provides examples of commonly used User Defined Functions (UDFs) for writing Queries with the Hive Execution Type.
For more information on the Hive tables available within Loyalty, see Hive Tables.
Note: For more information on Hive UDFs, see the Apache Hive user documentation.
in_period()
This function check if the given date is within the evaluated period based on the given expression.
Parameters
-
Date: The date to be evaluated.
-
String: The time period expression that will be converted to a start date and end date (see Time Period Expressions below for details on valid time periods.
Return
-
Return data type is Boolean
-
Returns true if the date in question is within the start and end date, false otherwise.
Usage
-
in_period(<timestamp>,’<expression>’)
Example
SELECT count(*)
FROM ACTIVITY_TABLE
WHERE in_period(sl_activity_ts,'last1m')=TRUE
You can add EXPLAIN to the beginning of the Query to get the actual start and end dates evaluated by the function.
When used in WHERE clauses, be sure to always provide a right-hand side value.
age()
This function computes the age of the given date with respect to the current date.
Parameters
-
Date: The date whose age is to be evaluated.
-
String: Possible values: year, month, day, or hour. Defaults to year if given value is invalid.
Return
-
Return data type is Integer
-
Returns the age of the given date with respect to the specified time unit.
Usage
- age(<timestamp>,’<year|month|day|hour>’)
Example
SELECT count(*)
FROM MEMBER_TABLE
WHERE age(member_since,'month')< 1
get_age_group()
This function determines the age group of the given date using a standard set of age groups (see below for details on how to create custom age groups).
Parameters
-
Date: The date whose age group will be evaluated.
Return
-
Return data type is String
-
Returns the age group of the given date
-
The standard age groups are as follows:
-
<15
-
15 - 24
-
25 - 34
-
35 - 44
-
45 - 54
-
55 - 64
-
65+
-
Other
-
Usage
-
get_age_group(<timestamp>)
Example
SELECT get_age_group(birthdate) AS "Age Group",
count(*)
FROM MEMBER_TABLE
GROUPBY "Age Group"
get_age_group() - extended
This function determines the age group of the given date, using configurable age groups and group labels.
Parameters
-
Date : The date whose age group will be evaluated.
-
String: Possible values: year, month, or day. Defaults to year if given value is invalid.
-
Decimal Array : Contains the threshold values (exclusive) in order from lowest to highest.
-
String Array: Contains the group labels corresponding to the same ordering as in the Decimal Array parameter. This array should always have one more element than the Decimal Array, to account for ages greater than the last value in the Decimal Array (for example: "65+").
Return
-
Return data type is String
-
Returns the age group label of the given date
Usage
-
get_age_group(<timestamp>,’<year|month|day>’, array(int), array(string))
Example
SELECT get_age_group (birthdate, 'year',
array(15, 25, 35, 45, 55, 65),
array('<15', '15-24', '25-34', '35-44', '45-54', '55-64', '65+'))
AS `Age Group`, count(*)
FROM MEMBER_TABLE
GROUP BY get_age_group(birthdate, 'year',
array(15, 25, 35, 45, 55, 65),
array('<15', '15-24', '25-34', '35-44', '45-54', '55-64', '65+'))
get_engagement_level()
This function matches the given integer or count to an engagement level, using a standard set of levels
Parameters
-
Integer: Activity count
Return
-
Return data type is String
-
Returns the engagement level
-
The standard engagement levels are as follows:
-
0
-
1 - 4
-
5 - 9
-
10 - 14
-
15+
-
Usage
-
get_engagement_level(<activity count>)
Example
SELECT current_tier AS "Tier",
get_engagement_level(count(*))
FROM MEMBER_TABLE
GROUP BY"Tier"
sl_sign()
Based on the earn type (earn, expire, or redeem), this function returns either a positive or negative multiplier.
Parameters
-
String: The earn type
Return
-
Return data type is Integer
-
Returns 1 if the earn type is earn; returns -1 if the earn type is expire or redeem; returns 0 otherwise
Usage
-
sl_sign(earn_type), or
-
sl_sign(‘<earn type>’)
Example
SELECT SUM (sl_sign(earn_type)*%PrimaryMetric) AS "Balance"
FROM ACTIVITY_TABLE
get_region()
This function returns the region based on the mailing state, using a set of standard regions (US only).
Parameters
-
String: The mailing state.
Return
-
Return data type is String
-
Returns the region of the given mailing state
-
The standard regions are:
-
West
-
Southwest
-
Southeast
-
Midwest
-
Northeast
-
Usage
-
get_region(mailing_state), or
-
get_region(‘<state>’)
Example
SELECT get_region(mailing_state) AS "Region",
count(*)
FROM MEMBER_TABLE
GROUP BY"Region"
get_all_time_balance()
This function takes a decimal array with length of 3 as the parameter. It computes the all-time Metric balance, assuming that the first element of the array is the all-time Metric earn value, the second element is the all-time Metric redeem value, and the third element is the all-time Metric expire value.
Parameters
-
Decimal Array: The array from which to compute the all-time Metric balance.
Return
-
Return data type is Decimal
-
Returns array[0] - array[1] - array[2]
Usage
-
get_all_time_balance(<metric>)
Example
SELECT SUM(COALESCE
(get_all_time_balance
(array(all_time_metrics_earned["point"],
all_time_metrics_redeemed["point"],
all_time_metrics_expired["point"])),
0))AS"Balance"
FROM MEMBER_TABLE
format_timestamp_string()
This function formats a timestamp to: yyyy-MM-dd'T'HH:mm:ss.SSS
Parameters
-
String: A timestamp formatted string.
Return
-
Returns yyyy-MM-dd'T'HH:mm:ss.SSS formatted string.
-
If the timestamp format is unrecognized, the input string is returned.
Usage
-
format_timestamp_string('2015-04-11T10:30:45.123+08:00')
Example
SELECT cast(format_timestamp_string(member_since) as timestamp) as member_since
FROM MEMBER_TABLE
get_number_group()
This function computes the grouping of the given number column (can be decimal type), with support for configurable groupings and group labels. This function is similar to get_age_group except this one supports numbers, and accepts only 3 parameters.
Parameters
-
Number: The number column that will be evaluated against the groupings.
-
Decimal Array: Contains the threshold values (exclusive) of the groupings, in order from lowest to highest.
-
String Array: Contains the group labels corresponding to the same ordering as in the Decimal Array parameter. This array should always have one more element than the Decimal Array, to account for numbers greater than the last value in the Decimal Array.
Return
-
Return type is String.
-
Returns the group label for the grouping to which the number column belongs.
Usage
-
get_number_group(<number>, array(int), array(string))
Example
SELECT get_number_group(total_score, array(100, 200, 300, 400, 500),
array('Regular', 'Premium', 'Silver', 'Gold', 'Diamond', 'Elite')) AS `Membership Category`
FROM MEMBER_TABLE
sum_array()
This function takes an array of numbers, and computes the sum of the elements of the given array.
Parameters
-
Number Array: The array of values to be summed together.
Return
-
Return type is Decimal.
-
Returns the sum of the elements of the given array.
Usage
-
sum_array(<array>)
Example
SELECT member_id,
sum_array(map_values(all_time_activities))
FROM MEMBER_TABLE
same_array()
This function takes two arrays (of primitive elements) and determines if they are equal, using an optional Boolean flag to indicate if the order of the elements should be ignored or not.
Parameters
-
Array / List of primitive elements
-
Array / List of primitive elements
-
Optional Boolean flag:
-
True: Ignore order of elements
-
False: Include order of elements
-
Null / not provided: Defaults to False.
-
Return
-
Return type is a Boolean.
-
Returns True if the two arrays contain the same elements with order of elements ignored / included; otherwise, returns False.
-
If first parameter, or second parameter, or both is null, returned value defaults to False.
Usage
-
same_array(<array>, <array>, [optional] <boolean>)
Example
SELECT * FROM MEMBER_TABLE
WHERE same_array (list_of_string_attr["browsing_preference"],
array("TV Series"), true) = TRUE
is_subset()
This function compares two arrays (of primitive elements), and determines whether the first array parameter is a subset of the second array parameter.
Parameters
-
Array / List of primitive elements: Array to be checked if it is a subset of the second array.
-
Array / List of primitive elements: Reference array against which the first array is checked.
Return
-
Return type is a Boolean.
-
Returns True if the first array is a subset of the second array; otherwise, returns False.
-
If first parameter, or second parameter, or both is null, returned value defaults to False.
Usage
-
is_subset(<arrayToCheck>, <referenceArray>)
Example
SELECT * FROM MEMBER_TABLE
WHERE is_subset(array("TV Series", "Anime Series"),
list_of_string_attr["browsing_preference"]) = true
Time Period Expressions
-
MM/dd/yyyy-MM/dd/yyyy: Period between the first date and the second date, inclusive
-
MM/dd/yyyy or MM/dd/yyyy: Period since first date
-
alltime: all time
-
ytd: Year to date
-
mtd: Month to date
-
wtd: Week to date
-
last<n>d: last n days (for example: last5d)
-
last<n>w: last n weeks (for example: last3w)
-
last<n>m: last n months (for example: last2m)
-
prev<n>d: previous n days, used for period-to-period comparison (for example: prev5d is the 5 days before the last 5 days).
-
prev<n>w: previous n weeks, used for period-to-period comparison (for example: prev5w is the 5 weeks before the last 5 weeks).
-
prev<n>m: previous n months, used for period-to-period comparison (for example: prev5m is the 5 months before the last 5 months).
Notes: All times are 12:00 AM midnight. Dates are in Coordinated Universal Time (UTC.)